ABSTRACT

We examine the three-dimensional clustering of C IV absorption-line systems, using 
an extensive catalog of QSO heavy-element absorbers drawn from the literature. We 
measure clustering by a volume-weighted integral of the correlation function called 
the reduced second-moment measure, and include information from both along and 
across QSO lines of sight, thus enabling a full determination of the three-dimensional 
clustering of absorbers, as well as a comparison of line- and cross-line-of-sight clustering 
properties. Here we present the three-dimensional reduced second-moment estimator 
for a three-dimensional point process probed by one-dimensional lines of sight, and 
apply our algorithm to a sample of 345 C IV absorbers with median redshift $\langle z \rangle = 2.2$,
drawn from the spectra of 276 QSOs. We confirm the existence of significant clustering
on comoving scales up to 100 $h^{-1}$ Mpc ($q_0 = 0.5$), and find that the additional
cross-line-of-sight information strengthens the evidence for clustering on scales from 100
$h^{-1}$ Mpc to 150 $h^{-1}$ Mpc. There is no evidence of absorber clustering along or across
lines of sight for scales from 150 $h^{-1}$ Mpc to 300 $h^{-1}$ Mpc. We show that with a
300-times larger catalog, such as that to be compiled by the Sloan Digital Sky Survey
(100,000 QSOs), use of the full three-dimensional estimator and cross-line-of-sight
information will substantially increase clustering sensitivity. We find that standard
errors are reduced by a factor of 2 to 20 on scales of 30 to 200 $h^{-1}$ Mpc, in addition to the
factor of $\sqrt{300}$ reduction from the larger sample size, effectively increasing the sample
size by an extra factor of 4 to 400 at large distances.

Subject headings: cosmology: observations — intergalactic medium — large-scale structure
of universe — methods: statistical — quasars: absorption lines

1. Introduction 

In a previous series of investigations (Vanden Berk et al. 1996; Quashnock, Vanden Berk, &
York 1996; Quashnock & Vanden Berk 1998), the clustering properties of C IV and Mg II absorbers
have been investigated, using an extensive catalog of heavy-element absorption-line systems drawn
from the literature.$^1$ These authors used a one-dimensional correlation analysis (one confined to
pairs of absorbers along the same QSO line of sight) and found evidence for strong and evolving
clustering on small scales (1-16 $h^{-1}$ Mpc), as well as for superclustering on scales as large as
50-100 $h^{-1}$ Mpc. Together, these investigations suggest that these strong absorbers, with median rest
equivalent width $\langle W \rangle = 0.4$ Å for C IV (Quashnock & Vanden Berk 1998), are biased tracers of
the higher density regions of space, and that agglomerations of absorbers along a line of sight are
indicators of clusters and superclusters.

Recently, Quashnock & Stein (1999) used a new measure of clustering, called the reduced
second moment measure or $K(r)$ (Ripley 1988; Baddeley 1998), which directly measures the mean
overdensity of absorbers on scales $\lesssim r$. While closely related to other second-order measures of
clustering, such as the correlation function or the power spectrum, the reduced second moment
measure nevertheless has a number of advantageous statistical properties, and has recently been
studied by astrophysicists (Martinez et al. 1998; Quashnock & Stein 1999; Stein, Quashnock, &
Loh 2000). Quashnock & Stein (1999) found significant evidence for clustering of C IV absorbers on
scales of 20-100 $h^{-1}$ Mpc, with marginal evidence on 100-200 $h^{-1}$ Mpc scales, again suggesting that
the absorbers show superclustering much like what is seen locally in the distribution of galaxies.

However, their analysis also was confined to pairs of absorbers along the same line of sight,
and hence did not include cross-line-of-sight information valuable for a full determination of the
three-dimensional clustering properties of absorbers, which we are ultimately interested in finding.
Indeed, Richards et al. (1999) have claimed that there is evidence of some significant contamination
of true intervening systems along the line of sight by absorbers that are actually physically
associated with the QSO, and that such contamination may extend to relative velocities as great as
75,000 km s$^{-1}$. This means that such a contamination could be present as well in the superclustering
signal found by Quashnock & Stein (1999). In addition, on smaller scales of a few megaparsecs, Crotts,
Burles, & Tytler (1997) have claimed that the clustering of C IV systems across adjacent lines of
sight is significantly weaker than along a line of sight and have questioned whether the correlations
are due to velocity dispersion in associated systems rather than intrinsic spatial clustering. The
number of adjacent lines analyzed by Crotts, Burles, & Tytler (1997) was small, however.

$^1$Contact D. E. Vanden Berk (danvb@fnal.gov) for the latest version of the catalog; see York et al. (1991) for an
earlier version.

It is thus essential to study correlations both along and across QSO lines of sight, to enable a 
complete determination of the three-dimensional clustering of absorbers, as well as a comparison of 
line- and cross-line-of-sight clustering properties (and ultimately statistically discriminate between 
intrinsic and intervening absorbers). What is required is a method that measures spatial clustering 
of absorbers — which are nonetheless confined to QSO lines of sight — and can contrast clustering 
along and across lines of sight. 

Here we present the three-dimensional estimator for the reduced second moment measure of a 
three-dimensional point process probed by one-dimensional lines of sight. We apply our algorithm 
to a sample of 345 C IV absorbers with median redshift $\langle z \rangle = 2.2$, drawn from the spectra of
276 QSOs in the aforementioned catalog of Vanden Berk et al., with the goal of determining the 
clustering of absorbers on very large scales and seeing if the signal found by Quashnock & Stein 
(1999) remains when including cross-line-of-sight information, or, more compellingly, using only 
cross-line-of-sight information. 

The outline of the paper is as follows: In §2 we define the three-dimensional reduced second 
moment measure, compare it to its one-dimensional analog, and outline our method of estimation. 
In §3 we apply our methodology to the above sample and present our results. In §4 we interpret the 
results and discuss how the full three-dimensional estimator and cross-line-of-sight information 
will improve our ability to measure clustering with the absorber sample from the Sloan Digital Sky 
Survey (100,000 QSOs). Finally, we conclude in §5 and present the details of the three-dimensional 
reduced second moment estimator in the Appendix. 

2. The Reduced Second Moment Measure 

Here we assume that the clustering of absorbers is both statistically homogeneous and
stationary (does not depend on cosmic epoch or redshift $z$) when examined in comoving coordinates.
The latter assumption is likely not to be strictly true, since growth of the correlation function with
decreasing redshift has been detected, at least on smaller scales of 1-16 $h^{-1}$ Mpc (Quashnock &
Vanden Berk 1998). Nevertheless, our results here can be thought of as averages for the absorber
sample as a whole, which has a characteristic redshift given by the median $\langle z \rangle = 2.2$. It is possible
to extend our treatment and examine the evolution of the clustering with redshift (see §4 below
regarding the Sloan Digital Sky Survey), but we have not done so here, largely because of the
limited size of our sample. We follow the usual convention and take the Hubble constant, $H_0$, to
be $100\,h$ km s$^{-1}$ Mpc$^{-1}$ and take $q_0 = 0.5$ and $\Lambda = 0$.

2.1. Definitions 

We treat the distribution of absorbers as a point process in three-dimensional space rather than
as a one-dimensional process on the lines of sight, and use the reduced second moment to describe
the three-dimensional clustering. Otherwise, we closely follow the treatment given in Quashnock
& Stein (1999), where one-dimensional clustering is described, and use the same symbol $K$
for the three-dimensional reduced second moment.

Let $\lambda$ be the mean number of absorbers per unit comoving volume. The absorbers have some
physical size, perhaps of order 100 kpc or so (Churchill, Steidel, & Vogt 1996), so that there is a
finite probability of intersection between an absorber and the QSO lines of sight. For simplicity, we
assume the absorbers are balls of identical radius $d$. Our treatment is accurate on scales of a few
Mpc and greater, which are much larger than the physical size of the absorbers. The clustering we
seek to measure is that of the centers of the absorbers.

Neither our methods nor our results depend on specifying $d$ (see Appendix). Since absorbers do
vary in size, mass and column density, and their clustering likely depends on these quantities
(Cristiani et al. 1997; D'Odorico et al. 1998), our results cannot be directly interpreted as a quantitative
measure of the clustering of mass. Nevertheless, we expect our results to be qualitatively correct
to the extent that if our estimates of $K$ show evidence of clustering on some spatial scale, matter
should also show clustering on this same scale, especially if the latter is quite large. Furthermore,
our results are neither more nor less dependent on the assumption of equal absorber size than those
in the one-dimensional analyses presented in Quashnock & Vanden Berk (1998) or Quashnock &
Stein (1999); rather, it is just harder to ignore the fact that absorbers must have a finite cross
section and volume when doing a three-dimensional analysis based on intersections of absorbers
with lines of sight.

The reduced second moment measure, $K(r)$, is the conditional expectation, or average, given
that there is an absorber center at $x$, of the number of absorbers (other than the one at $x$ itself),
$N(x, r)$, whose centers are within a comoving distance $r$ of $x$, normalized by $\lambda$:

$$K(r) = \frac{1}{\lambda} E\left[ N(x, r) \mid \text{absorber at } x \right] . \qquad (1)$$

Because of our assumption of homogeneity, the expected number of absorbers in equation (1) does
not depend on $x$. With $q_0 = 0.5$ and $\Lambda = 0$, the comoving distance $r$ between two absorbers at
redshifts $z_1$ and $z_2$ is $r = (2c/H_0) \times \left| 1/\sqrt{1 + z_1} - 1/\sqrt{1 + z_2} \right|$.
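
As a quick numerical check of this distance formula, the following minimal Python sketch evaluates it directly. The function name `comoving_separation` and the constant names are illustrative choices of ours, not part of the original analysis; distances come out in $h^{-1}$ Mpc because $H_0$ is written as $100\,h$ km s$^{-1}$ Mpc$^{-1}$.

```python
import math

C_KM_S = 299792.458   # speed of light in km/s
H0_OVER_H = 100.0     # Hubble constant in units of h km/s/Mpc

def comoving_separation(z1, z2):
    """Comoving separation in h^-1 Mpc between redshifts z1 and z2 (q0 = 0.5, Lambda = 0)."""
    prefactor = 2.0 * C_KM_S / H0_OVER_H
    return prefactor * abs(1.0 / math.sqrt(1.0 + z1) - 1.0 / math.sqrt(1.0 + z2))

if __name__ == "__main__":
    # Distances from the observer (z = 0) to two illustrative redshifts.
    for z in (1.25, 4.0):
        print(z, comoving_separation(0.0, z))
```

With these constants, the distances from $z = 0$ to $z = 1.25$ and $z = 4$ come out near 2000 and 3300 $h^{-1}$ Mpc, respectively.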

In terms of the two-point correlation function $\xi(r)$ (Peebles 1980, 1993), the reduced second
moment measure is given by

$$K(r) = 4\pi \int_0^r du \, u^2 \left[ 1 + \xi(u) \right] . \qquad (2)$$

If no correlations are present, then $K(r) = \frac{4}{3}\pi r^3$. Simply put, in this case the number of surrounding
absorber centers within distance $r$ of $x$ would not depend on the fact that there is an absorber center
at $x$, and would simply be equal to $\frac{4}{3}\pi r^3 \lambda$. The quantity $K(r)/(\frac{4}{3}\pi r^3) = 1 + \rho(r)$ is then a measure
of the relative mean density of absorbers around other absorbers, averaged over scales less than
$r$. Thus the reduced second moment measure is essentially a volume-weighted integral of the
correlation function, whereas in the one-dimensional treatment of Quashnock & Stein (1999) it
is a distance-weighted integral. Even if one does not use across-line-of-sight information, it is
arguably more natural to study the three-dimensional reduced second moment measure.
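
To make the relation between $K(r)$ and the correlation function concrete, here is a small Python sketch that integrates $4\pi u^2 [1 + \xi(u)]$ numerically. The midpoint rule, the function names, and the power-law $\xi$ with $r_0 = 5$ and $\gamma = 1.8$ are all illustrative assumptions of ours; they are not the absorber correlation function measured in this paper.

```python
import math

def k_from_xi(r, xi, n_steps=20000):
    """Midpoint-rule estimate of K(r) = 4*pi * integral_0^r u^2 [1 + xi(u)] du."""
    du = r / n_steps
    total = 0.0
    for i in range(n_steps):
        u = (i + 0.5) * du   # midpoints, so xi is never evaluated at u = 0
        total += u * u * (1.0 + xi(u)) * du
    return 4.0 * math.pi * total

def power_law_xi(u, r0=5.0, gamma=1.8):
    """Hypothetical power-law correlation function (illustrative values only)."""
    return (u / r0) ** (-gamma)

if __name__ == "__main__":
    r = 50.0
    poisson = 4.0 / 3.0 * math.pi * r ** 3
    # Ratio to the Poisson expectation: unity when xi = 0, above unity when clustered.
    print(k_from_xi(r, lambda u: 0.0) / poisson)
    print(k_from_xi(r, power_law_xi) / poisson)
```

The first ratio recovers the unclustered value of unity; the second exceeds it by the mean over-density within $r$.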

The relative mean over-density, $\rho(r)$, can be written in terms of the power spectrum, $P(k)$,
the Fourier transform of the correlation function $\xi(r)$, or equivalently, in terms of the dimensionless
power per logarithmic wavenumber, $\Delta^2(k) = k^3 P(k)/(2\pi^2)$:

$$\rho(r) = \int_0^\infty \frac{dk}{k} \, \Delta^2(k) \, W(kr) , \qquad (3)$$

where $W(kr) = 3 \left[ \sin(kr) - kr\cos(kr) \right] / (kr)^3$ is the window function for a top hat (hard sphere).
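
The top-hat window is easy to evaluate, but the closed form loses precision as $kr \to 0$. The Python sketch below is one way to handle this; the function name and the switch-over threshold of $10^{-3}$ are our own choices. It uses the leading terms of the Taylor series, $W(x) \approx 1 - x^2/10$, for small arguments.

```python
import math

def top_hat_window(x):
    """W(x) = 3 (sin x - x cos x) / x^3, with a series expansion near x = 0."""
    if abs(x) < 1e-3:
        # Taylor series W(x) = 1 - x^2/10 + O(x^4); avoids cancellation in sin x - x cos x.
        return 1.0 - x * x / 10.0
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3

if __name__ == "__main__":
    for x in (0.0, 0.5, math.pi, 10.0):
        print(x, top_hat_window(x))
```

The window tends to unity for wavelengths much longer than $r$ and oscillates with decaying amplitude for $kr \gg 1$, so $\rho(r)$ is dominated by power on scales larger than roughly $r$.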

Thus the reduced second moment measure, $K(r)$, is closely related to other second-order
measures such as the correlation function or the power spectrum, and it directly measures the
mean over-density of absorbers on scales less than $r$. However, it has a number of distinct and
desirable statistical properties which have been presented and discussed elsewhere (Quashnock &
Stein 1999; Stein, Quashnock, & Loh 2000).



2.2. Estimating $K(r)$

Here we outline our method of estimating $K$, taking into account all absorber pairs, deferring
the complete derivation of the estimators to the Appendix. We construct two estimators for $K(r)$:
the first, $\hat{K}_\parallel(r)$ (eq. [A7]), uses only absorber pairs on the same line of sight, and the second,
$\hat{K}_\perp(r)$ (eq. [A13]), uses only absorber pairs from different QSO lines of sight.$^2$ Both of these
are estimators for the same reduced second moment measure $K(r)$ defined in equation (1); to the
extent that these estimators agree, they provide evidence that any clustering found is not due to
absorbers associated with the QSOs (see §1). Furthermore, we can combine the along- and
across-line-of-sight information to obtain an overall estimator $\hat{K}(r)$ (eq. [A14]), which should be more
accurate than either $\hat{K}_\parallel(r)$ or $\hat{K}_\perp(r)$, since we use all of the available information.



$^2$Since there are no lines of sight in the sample that are within $r_0 = 0.5$ $h^{-1}$ Mpc of each other, and since $K(r)$
is a measure of integrated correlation, it is not possible to compute $\hat{K}_\perp(r_0)$ directly, using only across-line-of-sight
information. However, we can sensibly consider $\hat{K}_\perp(r) - \hat{K}_\perp(r_0)$ for $r$ greater than the minimum distance $r_0$ between
lines of sight (see Appendix).

In order to estimate $K(r)$ defined in equation (1), one needs to estimate the average number
of neighbors within a distance $r$ of a typical absorber. To understand the problems associated with
estimating this quantity, it is helpful to consider the simpler setting of observing a point process in
a single contiguous region of space. One possible way to estimate $K(r)$ for a point process observed
in a window (or sample region) $A$ with volume $a$ is, for each point in $A$, to count how many other
points in $A$ are within $r$ of it, sum these counts, and then divide by an appropriate quantity that
cancels out the effect of the overall intensity of the process. Ripley (1988) calls such an estimator
the "naive" estimator. The problem with this estimator is that it tends to underestimate $K(r)$
because for a point within distance $r$ of a boundary of $A$, one may not see all of the points of the
process that are within $r$ of it (see Figure 1).
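
The naive estimator described above can be sketched in a few lines of Python. This is a minimal illustration for points in a single contiguous window, not the estimator used in this paper. The function name is ours, and we assume the normalization $N(N-1)/a$ as the "appropriate quantity".

```python
import math
from itertools import permutations

def naive_k(points, r, volume):
    """Naive estimate of K(r): ordered pair counts within r, times volume / (N (N - 1))."""
    n = len(points)
    # Each unordered pair is counted twice, once from each point's perspective.
    count = sum(1 for p, q in permutations(points, 2) if math.dist(p, q) <= r)
    return volume * count / (n * (n - 1))

if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
    print(naive_k(pts, 1.5, 27.0))
```

Because no weight is applied to points near the window boundary, neighbors lying just outside the window are never counted, which is the source of the downward bias noted in the text.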

There are a number of methods of correcting for this "edge effect" (Ripley 1988; Baddeley 1998;
Stein 1993). In this work, we use the isotropic correction (Ripley 1988), which is computationally
well-suited to the setting of a process observed along lines of sight. To describe this correction,
consider the two-dimensional setting pictured in Figure 1. For a point at $x \in A$, if another point
$y \in A$ is within distance $r$ of $x$, then instead of giving this event a weight of 1 as in the naive
estimator, it is given weight $w(x, |x - y|)$ equal to the reciprocal of the fraction $\alpha/2\pi$ of the circle of
radius $|x - y|$ centered at $x$ that is contained within $A$ (see Figure 1). As with all of the various
edge-correction methods, we then have

$$E \left[ \sum_{x \neq y} w(x, |x - y|) \, 1\{ |x - y| \leq r \} \right] = \lambda^2 a K(r) , \qquad (4)$$

where $1\{\cdot\}$ is the indicator function, which is unity if the condition in brackets is true and zero
otherwise. To estimate $K(r)$, we divide this sum by some estimate of $\lambda^2 a$. Denoting by $N$ the total
number of points observed in $A$, we will use $N(N - 1)/a$ as our estimator of $\lambda^2 a$.
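
For the two-dimensional setting just described, the isotropic weight can be approximated numerically. The Python sketch below is only an illustration under our own assumptions: the window $A$ is a rectangle, the fraction of the circle inside $A$ is estimated by sampling equally spaced angles rather than computed in closed form, and all function names are hypothetical. The weight actually used for lines of sight is derived in the Appendix.

```python
import math

def isotropic_weight(x, s, width=1.0, height=1.0, n_angles=3600):
    """Reciprocal of the sampled fraction of the circle of radius s about x inside the window."""
    inside = 0
    for k in range(n_angles):
        theta = 2.0 * math.pi * (k + 0.5) / n_angles
        px = x[0] + s * math.cos(theta)
        py = x[1] + s * math.sin(theta)
        if 0.0 <= px <= width and 0.0 <= py <= height:
            inside += 1
    if inside == 0:
        raise ValueError("no sampled point of the circle lies inside the window")
    return n_angles / inside

def isotropic_k(points, r, width=1.0, height=1.0):
    """Edge-corrected estimate of K(r), normalized by N (N - 1) / a with a = width * height."""
    n = len(points)
    area = width * height
    total = 0.0
    for i, x in enumerate(points):
        for j, y in enumerate(points):
            s = math.dist(x, y)
            if i != j and s <= r:
                total += isotropic_weight(x, s, width, height)
    return area * total / (n * (n - 1))

if __name__ == "__main__":
    print(isotropic_weight((0.5, 0.5), 0.2))   # interior point
    print(isotropic_weight((0.0, 0.5), 0.1))   # point on an edge
    print(isotropic_weight((0.0, 0.0), 0.1))   # point at a corner
```

An interior point receives weight 1, a point on an edge sees about half of the circle and receives a weight near 2, and a corner point receives a weight near 4.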

Even recognizing that absorbers have a finite volume, we get to observe absorbers in almost
none of the ball of radius $r$ around any absorber, when estimating the three-dimensional $K$ function.
Thus, whereas with observations in a single contiguous region, $w(x, |x - y|)$ often equals 1 (i.e.,
when $x$ is not within $|x - y|$ of the boundary of $A$), for an absorber catalog observed along lines of
sight, $w(x, |x - y|)$ will always be much bigger than 1. The exact form of the weight function is given
in the Appendix. As one should expect, the weight is inversely proportional to the cross section
of the absorbers, or equivalently, to $d^2$. Fortunately, our estimator for $\lambda^2 a$ will also be inversely
proportional to $d^2$, so the factor of $d^2$ cancels when estimating $K(r)$.

Quashnock & Stein (1999) used what is known as the rigid-motion estimator to correct for
edge effects when estimating the one-dimensional reduced second moment function. To apply the
rigid-motion method to an estimate that uses across-line-of-sight information, for every observed
distance between pairs of absorbers less than the maximum distance at which we wish to estimate
$K$, we would have to apply a three-dimensional rigid motion to the lines of sight, calculate the
amount of overlap between the old and new set of lines, and then average this amount over all
possible directions. It is not clear how one could do this accurately in practice. Since we have
no evidence regarding which estimator is statistically superior in the present setting, we use the
computationally much simpler isotropic estimator.

3. Results 

We have used equation (A7) and equation (A14) to estimate the reduced second moment
measure, $K(r)$, for 276 QSO lines of sight, obtained from the Vanden Berk et al. catalog. A total
of 345 C IV absorbers have been selected from this heterogeneous catalog, using selection criteria
(Quashnock, Vanden Berk, & York 1996; Quashnock & Vanden Berk 1998) designed to obtain as
homogeneous a data set as possible. We refer the reader to these papers for a detailed description
of the selection criteria.

Figure 2 shows both $\hat{K}_\parallel(r)$ (dashed line) and $\hat{K}(r)$ (solid line), divided by their Poisson
expectation value, $\frac{4}{3}\pi r^3$, for the entire C IV absorber catalog. These resultant quantities have expectation
value of very nearly unity if there is no clustering of absorbers (see the Appendix). Figure 2 shows
that the two estimates agree very well (within their estimated errors, see below) over all distances
$r$ from 5 to 300 $h^{-1}$ Mpc, with $\hat{K}_\parallel(r)$ just slightly larger than $\hat{K}(r)$ between 30 and 140 $h^{-1}$ Mpc.

Note that, along the same line of sight, the number of absorber pairs separated by very large
distances is small, because of the finite comoving length of the lines of sight (the median length is
350 $h^{-1}$ Mpc [Quashnock & Stein 1999]); thus, in Figure 2, for distances $r$ of 170 $h^{-1}$ Mpc and
greater, $\hat{K}_\parallel(r)$ is noticeably less smooth than $\hat{K}(r)$. Examination of the numbers of absorber pairs
along and across lines of sight indicates that it is at such distances that the across-line-of-sight
information dominates the total information available. In Table 1, we show the number of absorber
pairs in the data set, for pairs along and across lines of sight, as a function of pair separation $r$,
in 10 $h^{-1}$ Mpc bins. We also show the cumulative number of pairs for separations $< r$. For pair
separations $r > 170$ $h^{-1}$ Mpc, there are more additional absorber pairs across different lines of sight
than along the same line of sight, whereas for pair separations $r < 100$ $h^{-1}$ Mpc, the opposite is
true. These two numbers delineate the regimes where, in the Vanden Berk et al. catalog, clustering
information arises primarily from pairs across and along lines of sight, respectively. In particular,
Table 1 shows that the sample is too sparse to significantly compare clustering along and across
lines of sight on scales of less than 100 $h^{-1}$ Mpc.

If the centers of absorbers form a homogeneous Poisson process in three dimensions, i.e., if they
are unclustered, then their intersections with the lines of sight form independent one-dimensional
Poisson processes, provided their size $d$ is sufficiently small compared to the scales of interest (see
Appendix). Thus it is straightforward to simulate the distribution of $\hat{K}_\parallel(r)$ and $\hat{K}(r)$ under the
assumption that the C IV absorbers are unclustered.
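
A minimal Python sketch of such a simulation, under our own simplifying assumptions, is given below. Each line of sight is treated as an interval of given comoving length, and absorbers are placed on it by accumulating exponential gaps, which yields a homogeneous one-dimensional Poisson process. The function names are hypothetical, and the rate used in the example, $345/(276 \times 350)$ per $h^{-1}$ Mpc, is only a rough illustrative value that treats the median line length as typical. The sketch does not reproduce the three-dimensional arrangement of the actual lines of sight.

```python
import random

def simulate_line(length, rate, rng):
    """Positions of a homogeneous 1D Poisson process with the given rate on [0, length]."""
    positions = []
    t = rng.expovariate(rate)        # first gap from the start of the line
    while t <= length:
        positions.append(t)
        t += rng.expovariate(rate)   # independent exponential gaps
    return positions

def simulate_catalog(line_lengths, rate, seed=0):
    """One unclustered mock catalog: a list of absorber positions for each line of sight."""
    rng = random.Random(seed)
    return [simulate_line(length, rate, rng) for length in line_lengths]

if __name__ == "__main__":
    rate = 345.0 / (276.0 * 350.0)   # rough absorbers per h^-1 Mpc (illustrative)
    catalog = simulate_catalog([350.0] * 276, rate, seed=1)
    print(sum(len(line) for line in catalog))
```

Repeating this for many seeds and applying the estimators to each mock catalog gives the simulated distribution under the unclustered hypothesis.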

In Figure 3, we show the 95% region of variation about the expectation value of unity of,
respectively, $\hat{K}_\parallel(r)$ (dashed line) and $\hat{K}(r)$ (solid line), divided by their Poisson expectation value,
$\frac{4}{3}\pi r^3$, for 10,000 simulated data sets of unclustered absorbers with the same arrangement of lines
and average number of absorbers as the Vanden Berk et al. catalog. Their averages (dotted lines)
are very near the true value of unity, indicating that both estimators are very nearly unbiased in
this case. We find that the two estimators have essentially the same region of variation on scales up
to $\sim 150$ $h^{-1}$ Mpc; beyond that, the range is smaller for $\hat{K}(r)$, reflecting the additional information
contributed by pairs of absorbers across different lines of sight.

To properly interpret the results in Figure 2, one needs some measure of the uncertainties in the
estimates. Quashnock & Stein (1999) obtained approximate confidence intervals by bootstrapping, 
or resampling, lines of sight. Such a procedure makes no use of the relative locations of the lines 
of sight and is nonsensical when applied to an estimator using across-line-of-sight information. A 
bootstrapping procedure based on resampling regions of the sky would be more appropriate in the 
present setting, but the problems of handling edge effects and the uneven spatial distribution of 
lines of sight complicate matters, so it is unclear how well such a procedure would work. 

To provide a rough idea as to the uncertainty of our estimators of $K(r)$, we use a crude but
simple approach. We divide the sky into eight regions, each containing nearly the same total length
of lines of sight. We then compute the sample standard deviation of the eight estimators of $K(r)$
in the eight regions, assume that the standard deviation $\sigma$ for the estimator based on all of the
data will be smaller by a factor of $\sqrt{8}$, and then use the overall estimate plus or minus $2\sigma$ as our
confidence interval.
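
The arithmetic of this subsample approach can be sketched in Python as follows. The function name and the eight input values are hypothetical and serve only to show the computation; they are not results from the catalog.

```python
import math
import statistics

def subsample_interval(region_estimates, overall_estimate):
    """Rough +/- 2 sigma interval from estimates made in disjoint sky regions."""
    n_regions = len(region_estimates)
    # Sample standard deviation across regions, scaled down by sqrt(number of regions).
    sigma = statistics.stdev(region_estimates) / math.sqrt(n_regions)
    return overall_estimate - 2.0 * sigma, overall_estimate + 2.0 * sigma

if __name__ == "__main__":
    # Hypothetical values of K(r) / (4/3 pi r^3) in eight regions at one fixed r.
    regions = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.0]
    print(subsample_interval(regions, 1.0))
```

The interval is computed separately at each $r$, so the resulting band is pointwise rather than simultaneous.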



4. Discussion

It is clear that Figure 2 shows strong evidence of clustering on scales up to 100 $h^{-1}$ Mpc
($q_0 = 0.5$), and possibly beyond. In addition, $\hat{K}_\parallel(r)$ and $\hat{K}(r)$ essentially agree on the magnitude
of the reduced second moment measure up to this distance. This is to be expected, since, as is
clear from Table 1, there are very few additional pairs of absorbers coming from different lines of
sight on these scales. Thus, the sample is too sparse to significantly compare clustering along and
across lines of sight on scales of less than 100 $h^{-1}$ Mpc.

On scales greater than 100 $h^{-1}$ Mpc, however, there are significantly more additional pairs of
absorbers from different lines of sight, and it becomes possible to compare their clustering along and
across lines of sight. Since $K(r)$ is an integrated measure of clustering on scales from zero to $r$, it is
necessary to look at estimates of differential quantities like $K(r_2) - K(r_1)$ (with $r_2 > r_1$) in order
to examine clustering on scales strictly between $r_1$ and $r_2$. We have investigated the significance of
clustering on scales greater than 100 $h^{-1}$ Mpc by examining the quantity

$$\hat{\xi}(r_1, r_2) = \frac{\hat{K}(r_2) - \hat{K}(r_1)}{\frac{4}{3}\pi \left( r_2^3 - r_1^3 \right)} - 1 . \qquad (5)$$

The quantity $\hat{\xi}(r_1, r_2)$ is the estimated average volume-weighted correlation function on scales
between $r_1$ and $r_2$. If the latter two are reasonably close to each other, then $\hat{\xi}(r_1, r_2)$ is an approximate
measure of the correlation function $\xi(r)$ on the average scale $r = (r_1 + r_2)/2$. Similarly, we define
the quantities $\hat{\xi}_\parallel(r_1, r_2)$ and $\hat{\xi}_\perp(r_1, r_2)$ by replacing $\hat{K}$ in equation (5) above by $\hat{K}_\parallel$ and $\hat{K}_\perp$,
respectively. These two quantities are estimates of the average volume-weighted correlation function
that use only along- or cross-line-of-sight information, respectively. We have examined clustering
on scales $r$ greater than 100 $h^{-1}$ Mpc by computing $\hat{\xi}$, $\hat{\xi}_\parallel$, and $\hat{\xi}_\perp$ for a sliding window that is 50
$h^{-1}$ Mpc wide and centered on $r$.
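
The sliding-window computation can be sketched in Python as below. We assume here that the differential quantity is the difference of $K$ at the two radii, divided by the volume of the shell between them, minus one. This form follows from integrating equation (2) between $r_1$ and $r_2$, but the function names and the toy $K(r)$ in the example are our own.

```python
import math

def xi_hat(k_r1, k_r2, r1, r2):
    """Average volume-weighted correlation between r1 and r2 from K at the two radii."""
    shell_volume = 4.0 / 3.0 * math.pi * (r2 ** 3 - r1 ** 3)
    return (k_r2 - k_r1) / shell_volume - 1.0

def sliding_window(k_func, centers, half_width=25.0):
    """Evaluate xi_hat(r - half_width, r + half_width) at each window center r."""
    return [
        xi_hat(k_func(r - half_width), k_func(r + half_width), r - half_width, r + half_width)
        for r in centers
    ]

if __name__ == "__main__":
    def poisson_k(r):
        # K(r) for an unclustered process.
        return 4.0 / 3.0 * math.pi * r ** 3

    print(sliding_window(poisson_k, [100.0, 150.0, 200.0]))
```

For an unclustered process the result is zero at every window center (up to rounding), and a constant excess correlation is recovered exactly, since only differences of $K$ enter.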

Figure 4 (solid lines) shows all three quantities, $\hat{\xi}(r - 25, r + 25)$ (top panel), $\hat{\xi}_\perp(r - 25, r + 25)$
(middle panel), and $\hat{\xi}_\parallel(r - 25, r + 25)$ (lower panel), for scales $r$ between 100 $h^{-1}$ Mpc and 300
$h^{-1}$ Mpc. Their approximate 95% regions of variation (again estimated by dividing the QSO sample
into eight subsamples corresponding to eight different regions of the sky) are also shown (dashed
lines). All three curves are quite similar, showing evidence of clustering on scales $r$ between 100
$h^{-1}$ Mpc and 150 $h^{-1}$ Mpc, and all agree with each other within their approximate confidence
regions.

What is particularly striking, however, is that $\hat{\xi}_\perp$ (computed with pairs coming from different
lines of sight) essentially agrees within the errors with $\hat{\xi}_\parallel$ (computed with pairs coming from the
same line of sight); thus, the additional cross-line-of-sight information strengthens the evidence
for clustering on scales from 100 $h^{-1}$ Mpc to 150 $h^{-1}$ Mpc. Although the evidence for clustering
across lines of sight is, on its own, only marginally significant, it is consistent with the amplitude
and scale of clustering of absorbers along lines of sight; indeed, if anything it hints at being even
stronger on these scales.

Such clustering on 100 $h^{-1}$ Mpc to 150 $h^{-1}$ Mpc scales had been hinted at in the
one-dimensional work of Quashnock & Stein (1999), and has been confirmed by the three-dimensional
analysis here. This argues against claims that all of the apparent line-of-sight clustering on these
scales is due to significant contamination along the line of sight by absorbers that are actually
physically associated with the QSO (see §1). Figure 4 shows no evidence of clustering on scales
between 150 $h^{-1}$ Mpc and 300 $h^{-1}$ Mpc, using any of the three estimators; on scales greater than
150 $h^{-1}$ Mpc, the absorbers appear to be distributed in a manner that is consistent with isotropy.
Note that beyond 200 $h^{-1}$ Mpc, $\hat{\xi}$ has appreciably smaller estimated variability than $\hat{\xi}_\parallel$; this shows
how using the full three-dimensional estimator $\hat{K}(r)$ can improve the measurement of clustering
on very large scales, even for this modest-sized catalog.

Of course, the lines of sight in the Vanden Berk et al. absorber catalog are rather sparse, and
there are still only 276 lines that were analyzed here. The limited size of that catalog, as well as
its heterogeneity, precludes a final, strong statement of the statistical significance and amplitude
of the clustering on scales between 100 $h^{-1}$ Mpc and 150 $h^{-1}$ Mpc.

As soon as data are available, we will undertake a new effort at analyzing the clustering of
heavy-element absorbers in the Sloan Digital Sky Survey (hereafter SDSS), now underway
(Margon 1999). The SDSS QSO Absorption-Line Catalog (hereafter the SDSS Catalog) will include
heavy-element absorption-line systems found in the spectra of about 100,000 QSOs, with absorbers
ranging in redshift from $z = 0.5$ to $z \approx 5$. The SDSS Catalog will be of order 300 times larger than
the Vanden Berk et al. catalog; furthermore, it will be a homogeneous catalog with fixed selection
and detection criteria for the entire sample. Also, the density (number of QSOs per solid angle on
the sky) of probing lines of sight will be of order 300 times higher.

Because the SDSS Catalog will have much greater density of lines of sight than the catalog 
analyzed here, we should expect a much larger fraction of the information about clustering of 
absorbers from pairs across different lines of sight. Using simulations, we have estimated how much 
smaller the standard errors of the estimators will be in the 300-times larger SDSS Catalog; in 
particular, we have investigated how the standard errors will be reduced by using the full three- 
dimensional estimator and all the cross-line-of-sight information in the 300-times denser SDSS 
Catalog. 

To do this, we have made simulations of randomly placed absorbers and examined how the
standard errors of the estimators change if the lines of sight are present at an intensity comparable
to that which will be achieved in the SDSS Catalog. We define a region of space similar to that to
be probed by the QSO lines of sight of the SDSS Catalog: that section of a cone with half-angle
of 45° and Earth at its tip, which is bounded by comoving distance $2000 < r < 3300$ $h^{-1}$ Mpc
(corresponding to redshift $1.25 < z < 4$) from Earth. Lines of sight are placed randomly in this
region of space, with a uniform distribution of comoving lengths between 250 and 450 $h^{-1}$ Mpc
similar to that in the Vanden Berk et al. catalog (Quashnock & Stein 1999). Mock catalogs
were created, with total number, $m$, of lines of sight equal to 100, 1000, 10,000 and 100,000, and
absorbers randomly placed on all these lines with the same average number of absorbers per unit
comoving length as that observed in the Vanden Berk et al. catalog. The catalogs were generated
by simulating a one-dimensional Poisson process on the lines of sight. A set of 10,000 of these
unclustered mock catalogs was made for each $m$ except for $m = 100,000$, for which 100 mock
catalogs were made.
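
One way such a mock catalog could be generated is sketched below in Python. Several details are assumptions of ours that the description above does not fix. Directions are drawn uniformly in solid angle within the cone, each line of sight is radial and lies entirely inside the shell, and the absorber rate per unit comoving length is passed in by the caller. All names are hypothetical.

```python
import math
import random

R_MIN, R_MAX = 2000.0, 3300.0      # comoving shell in h^-1 Mpc
HALF_ANGLE = math.radians(45.0)    # cone half-angle

def random_line(rng):
    """One radial line of sight: (unit direction, near distance, far distance)."""
    # Uniform in solid angle within the cone: cos(theta) uniform on [cos(half-angle), 1].
    cos_t = rng.uniform(math.cos(HALF_ANGLE), 1.0)
    sin_t = math.sqrt(1.0 - cos_t * cos_t)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    direction = (sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t)
    length = rng.uniform(250.0, 450.0)
    near = rng.uniform(R_MIN, R_MAX - length)   # assumption: the whole line fits in the shell
    return direction, near, near + length

def mock_catalog(m, rate, seed=0):
    """Absorbers as (line index, (x, y, z)) on m random lines, Poisson along each line."""
    rng = random.Random(seed)
    absorbers = []
    for line_id in range(m):
        direction, near, far = random_line(rng)
        d = near + rng.expovariate(rate)
        while d <= far:
            absorbers.append((line_id, tuple(d * c for c in direction)))
            d += rng.expovariate(rate)
    return absorbers

if __name__ == "__main__":
    catalog = mock_catalog(100, 345.0 / (276.0 * 350.0), seed=2)
    print(len(catalog))
```

Keeping the line index with each absorber lets along-line and cross-line pairs be separated when the estimators are applied to the mock data.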

The variances of the reduced second moment estimators $\hat{K}_\parallel(r)$ and $\hat{K}(r)$ (the average, or
expectation value, is, in all cases, very near the true value of $\frac{4}{3}\pi r^3$) were computed for each $m$, for
$5 < r < 300$ $h^{-1}$ Mpc. We find, not surprisingly, that the standard error of $\hat{K}_\parallel(r)$ decreases with
the total number of lines $m$ as $m^{-1/2}$; however, this reduction in standard error is constant over
$r$. For $\hat{K}(r)$, there is an additional reduction for larger $r$. In Figure 5, we show the ratio of the
standard error of $\hat{K}(r)$ for $m = 1000$ (short-dashed line), 10,000 (long-dashed line), and 100,000
(solid line), to that of $\hat{K}(r)$ for $m = 100$. As the number of lines of sight increases, there is a
continued reduction in the standard error for larger distances, due to the additional and relatively
more important number of absorber pairs from across different lines of sight.

This is displayed more dramatically in Figure 6, where we show the relative improvement, or
ratio, of the standard errors of $\hat{K}(r)$ to $\hat{K}_\parallel(r)$ for $m = 100$ (dotted line), 1000 (short-dashed line),
10,000 (long-dashed line), and 100,000 (solid line). With 100,000 lines of sight, using $\hat{K}(r)$ instead
of $\hat{K}_\parallel(r)$ results in an additional factor of 2 to 20 reduction of the standard error on scales of 30 to
200 $h^{-1}$ Mpc, in addition to the factor of $\sqrt{300}$ reduction from the larger sample size, effectively
increasing the sample size by an extra factor of 4 to 400 at large distances.
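
The conversion between an error reduction and an equivalent sample size is simple arithmetic, sketched here in Python under the assumption that the standard error scales as the inverse square root of the sample size. The function name is ours.

```python
def equivalent_sample_factor(error_reduction):
    """Sample-size factor giving the same reduction, assuming error ~ 1 / sqrt(sample size)."""
    return error_reduction ** 2

if __name__ == "__main__":
    for factor in (2, 20):
        print(factor, equivalent_sample_factor(factor))
```

A reduction of the standard error by a factor of 2 to 20 therefore corresponds to a 4- to 400-fold larger sample analyzed with along-line pairs alone.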

It is the line density (number of lines of sight per solid angle on the sky, or the number of
QSOs per square degree) that determines the relative efficiency of $\hat{K}(r)$ to $\hat{K}_\parallel(r)$, not the total
number of lines $m$. This is also shown in Figure 6, where we show the ratio of the standard errors
of $\hat{K}(r)$ to $\hat{K}_\parallel(r)$, this time for $m = 100$ lines of sight, but where the angular density of lines is 10
times (short-dashed and dotted line) and 100 times (long-dashed and dotted line) higher than the
initial density; i.e., the solid angle of the conic region described above is 10 and 100 times smaller,
respectively. Figure 6 shows that increasing the density of lines by a given factor has the same
effect on the relative efficiency of $\hat{K}(r)$ to $\hat{K}_\parallel(r)$ as does simply increasing the total number of lines
of sight by that same factor. Of course, the overall size of the standard errors is governed by the
total number of lines $m$ (see Fig. 5).

These comparisons of the errors in $K(r)$ and $K_\parallel(r)$ are based on unclustered mock catalogs.
We are investigating, in ongoing simulations, how different the actual relative improvement might
be in a catalog of clustered absorbers such as the SDSS Catalog. Using a simple model of voids
and clusters (Loh 2001) that mimics the correlation structure of the Vanden Berk et al. catalog,
we have made 1000 clustered mock catalogs with 100 and 1000 lines of sight. We find, for example,
that for $r = 300$ $h^{-1}$ Mpc, the ratio of the standard errors of $K(r)$ to $K_\parallel(r)$ is 0.629 and 0.230 for
unclustered catalogs with 100 and 1000 lines of sight, respectively (see Fig. 6), and 0.669 and 0.270
for clustered catalogs with the same numbers of lines. This indicates that the relative improvement
that arises from using cross-line-of-sight pairs in clustered catalogs is still dramatic and is only
slightly less so (6% and 17% change in the above two cases) than for unclustered catalogs. Thus,
we expect use of the full three-dimensional estimator to substantially increase clustering sensitivity
in the SDSS Catalog, with a relative improvement that is only slightly less dramatic than what
is shown in Figure 6 (solid line). We hope to present more detailed results elsewhere, with much
larger numbers of lines of sight (Loh 2001).
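
The 6% and 17% figures quoted above are the relative changes in the standard-error ratio when
going from unclustered to clustered mock catalogs. The short sketch below reproduces that
arithmetic from the four ratios given in the text; the helper name is hypothetical.

```python
def percent_change(unclustered, clustered):
    """Relative change, in percent, of the clustered ratio with respect
    to the unclustered one."""
    return 100.0 * (clustered - unclustered) / unclustered


# Ratios of the standard errors of K(r) to K_parallel(r) quoted in the text.
print(round(percent_change(0.629, 0.669)))  # 100 lines of sight
print(round(percent_change(0.230, 0.270)))  # 1000 lines of sight
```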



5. Conclusions 

We present two estimators, $K_\parallel(r)$ and $K(r)$, for the three-dimensional reduced second moment
for one-dimensional data (absorber redshifts) along QSO lines of sight. The first estimator uses
absorber pairs along the same lines of sight, whereas the second also includes data from across different
lines of sight. We apply our algorithm to a sample of 345 C IV absorbers with median redshift
$\langle z \rangle = 2.2$, from the spectra of 276 QSOs, drawn from the catalog of Vanden Berk et al.

We confirm the existence of significant clustering of C IV absorbers on comoving scales up to
100 $h^{-1}$ Mpc ($q_0 = 0.5$), and find that the additional cross-line-of-sight information strengthens
the evidence for clustering on scales from 100 $h^{-1}$ Mpc to 150 $h^{-1}$ Mpc. This argues against claims
that all the apparent clustering on these scales is due to significant contamination along the line of
sight by absorbers that are actually physically associated with the QSO. However, the limited size
of that catalog, as well as its heterogeneity, precludes a final, strong statement of the statistical
significance and amplitude of the clustering on scales $\gtrsim 100$ $h^{-1}$ Mpc. Also, the sample is too
sparse for a statistically significant comparison of clustering along and across lines of sight on
scales of less than 100 $h^{-1}$ Mpc. There is no evidence of absorber clustering along or across lines
of sight for scales from 150 $h^{-1}$ Mpc to 300 $h^{-1}$ Mpc.

We show that with a 300-times larger catalog, such as that to be compiled by the Sloan
Digital Sky Survey (100,000 QSOs), use of the full three-dimensional estimator and
cross-line-of-sight information will substantially increase clustering sensitivity. We find that standard errors
are reduced by a factor 2 to 20 on scales of 30 to 200 $h^{-1}$ Mpc, in addition to the factor of $\sqrt{300}$
reduction from the larger sample size, effectively increasing the sample size by an extra factor of 4
to 400 at large distances. Thus, use of the full three-dimensional reduced second moment estimator
will significantly advance our ability to describe and analyze large-scale clustering of absorbers,
and hence visible matter, from the SDSS Catalog.

We wish to acknowledge Don York, Dan Vanden Berk, and all their collaborators for compiling 
the extensive catalog of heavy-element absorbers used in this study. We wish to thank Massimo 
Mascaro and Ken Wilder for the help they provided with the computer simulations we made. This 
work was supported in part by NASA grant NAG 5-4406 and NSF grant DMS 97-09696 (J. M. Q.), 
and by NSF grant DMS 99071127 (M. L. S. and J. M. L.). 

A. Three-dimensional reduced second moment estimators $K_\parallel(r)$, $K_\perp(r)$ and $K(r)$

Let $L$ denote the set made up of the $m$ QSO lines of sight, $L_i$ the $i$th line, $N$ the total number
of absorbers, and $\lambda$ the intensity of the absorber center process, or mean number of absorbers per
unit comoving volume. We use $\partial B_s(x,u)$ to represent a shell of inner radius $u$, centered at $x$ and
with thickness $s$, $v_R(\cdot)$ to indicate measure in $R$ dimensions and $\#\{\cdot\}$ the number of elements in
a set. When summing over absorber pairs, we use $\sum^{\parallel}$ to represent a sum over pairs on the same
lines of sight only, $\sum^{\perp}$ a sum over pairs across lines of sight only and $\sum$ a sum over all pairs.

We have to give the absorbers some physical size so that there is a non-zero probability of 
intersection between an absorber and the lines of sight. For simplicity, we assume the absorbers 
are balls of identical radius d. The clustering we seek to measure is the clustering of the point 
process of the centers of absorbers. Although we do not need to specify $d$, our method requires
an approximation that is accurate when $d$ is much smaller than the distances in which we are
interested. Define $Q = \pi d^2 v_1(L)$, where $v_1(L)$ is the total length of the lines, so $Q$ is effectively the
volume of space within which we can observe the center of an absorber.

Throughout we assume that $K(r)$ is continuous in $r$. To estimate $K(r)$, we first estimate
$\lambda^2 Q K(r)$ and then divide by an estimate of $\lambda^2 Q$. Estimating $\lambda^2 Q K(r)$ involves taking each absorber
in turn and counting the number of absorbers within a distance $r$. Suppose we have an absorber
observed at $x$ on some line of sight. Let another absorber be observed at $y$ on a possibly different
line of sight, with $|x - y| < r$. Its center must lie within $d$ of $y$. There are generally many other
absorbers within $r$ of $x$ that are not observed simply because they lie too far away from the lines
of sight. To take into account this edge effect, each absorber pair $(x, y)$ is given a weight.

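To demonstrate how appropriately chosen weights deal with the problem of edge effects when
estimating $\lambda^2 Q K(r)$, we first show that equation (4) of §2.2 holds. Define $1_{(0,r]}(u) = 1\{0 < u \le r\}$
and denote the empty set by $\emptyset$. When observations are in a contiguous window $A$ with volume $a$
and $\partial B_0(x, r) \cap A \neq \emptyset$ for all $x \in A$,

\begin{align}
& E\Big[\sum_{x \neq y} 1_{(0,r]}(|x - y|)\, w(x, |x - y|)\Big] \tag{A1} \\
&\quad = \lambda^2 \int_A \int_0^\infty \int_{\partial B_0(0,1)} 1_{(0,r]}(u)\, 1_A(x)\, 1_A(x + (u,\Omega))\, w(x,u)\, \frac{u^2\, d\Omega}{4\pi u^2}\, dK(u)\, dx \tag{A2} \\
&\quad = \lambda^2 \int_A \int_0^\infty 1_{(0,r]}(u)\, 1_{A_u}(x)\, dK(u)\, dx \tag{A3} \\
&\quad = \lambda^2 a \int_0^r dK(u) \tag{A4} \\
&\quad = \lambda^2 a K(r), \tag{A5}
\end{align}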

where in equation (A2), $(u, \Omega)$ are the spherical coordinates of $y - x$. In equation (A3), $A_u$ is the
set of points $x$ in $A$ such that $\partial B_0(x, u) \cap A \neq \emptyset$, which is simply $A$ when
$\partial B_0(x, u) \cap A \neq \emptyset$ for all $x \in A$.

The step from equation (A3) to (A4) holds only for $u$ less than the circumradius of $A$. Ohser
(1983) suggested adding a factor to the estimator so that this step to (A4) is valid at larger distances.
This factor is simply the ratio of the volumes of $A$ and $A_u$. This extension is not of much practical
value when $A$ is a single contiguous region, but it is critical for the line-of-sight catalog, since we
would otherwise be restricted to estimating $K$ at distances at most one half the shortest line of
sight in the catalog.

Equations (A1)-(A5) demonstrate that estimating $\lambda^2 a K(r)$ involves taking shells $\partial B_{du}(x, u)$
with $u < r$ for each point of the process $x \in A$, counting and weighting the number of other points
in these shells and integrating over $u$. We now seek to mimic this procedure for absorbers observed
along lines of sight. We first consider the simpler case of $K_\parallel(r)$, in which only pairs of absorbers
along the same line of sight are counted. Define $L(x)$ to be the line on which the absorber at $x$
lies. For each pair $(x, y)$ lying on the same line and less than $r$ apart, we set

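\begin{align}
w(x, |x - y|) &= 4 d^{-2} |x - y|^2\, du \,/\, v_1\big(\partial B_{du}(x, |x - y|) \cap L(x)\big) \tag{A6} \\
&= 4 d^{-2} |x - y|^2 / C_\parallel , \tag{A7}
\end{align}

where $C_\parallel = C_\parallel(x, |x - y|) = \#\{\partial B_0(x, |x - y|) \cap L(x)\}$. In the denominator of equation (A6), $L(x)$
is used and not $L$, since only absorber pairs on the same line of sight are considered. Thus $C_\parallel$ takes
the value 1 or 2. The estimate of $\lambda^2 Q K(r)$ using only absorber pairs on the same line is then

$$
\sum_{x \neq y}^{\parallel} \frac{1_{(0,r]}(|x - y|)\, 4 |x - y|^2}{d^2 C_\parallel} \cdot \frac{Q}{\pi d^2 v_1(A^{\parallel}_{|x-y|})} ,
$$

where $A^{\parallel}_{|x-y|} = \bigcup_{k=1}^{m} \{x \in L_k : \partial B_0(x, |x - y|) \cap L_k \neq \emptyset\}$ is the subset of $L$ containing points that
are a distance $|x - y|$ from at least one other point on the same line. Ohser's extension is the factor
$\pi d^2 v_1(A^{\parallel}_{|x-y|})/Q$, the proportion of such points in $L$. Taking $N(N - 1)/Q$ to be the estimate of
$\lambda^2 Q$, we obtain

\begin{align}
K_\parallel(r) &= \sum_{x \neq y}^{\parallel} \frac{1_{(0,r]}(|x - y|)\, 4 |x - y|^2\, Q^2}{d^2 C_\parallel\, N(N - 1)\, \pi d^2 v_1(A^{\parallel}_{|x-y|})} \notag \\
&= \sum_{x \neq y}^{\parallel} \frac{1_{(0,r]}(|x - y|)\, 4\pi |x - y|^2\, v_1(L)^2}{C_\parallel\, N(N - 1)\, v_1(A^{\parallel}_{|x-y|})} . \notag
\end{align}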
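
The same-line estimator can be sketched in a few lines of Python. This is a minimal illustration,
not the code used for the analysis: it assumes each line of sight is stored as a pair of comoving
endpoints `(a, b)` and each absorber as a comoving position along its line, and all function names
are hypothetical. Each ordered pair at separation $u \le r$ contributes
$4\pi u^2 v_1(L)^2 / [C_\parallel N(N-1) v_1(A^\parallel_u)]$, where $v_1(A^\parallel_u)$ is the total length of
points that have another point of the same line exactly $u$ away.

```python
import math


def length_with_partner(lines, u):
    """Total length of points that have another point of the same line
    at distance exactly u (the quantity entering Ohser's extension)."""
    total = 0.0
    for a, b in lines:
        length = b - a
        if u <= length / 2:
            total += length              # every point of the line qualifies
        elif u <= length:
            total += 2 * (length - u)    # only the two end pieces qualify
    return total


def c_parallel(x, u, a, b):
    """Number of intersections (1 or 2 for an observed pair) of the sphere
    of radius u centered at x with the segment [a, b] containing x."""
    return int(x + u <= b) + int(x - u >= a)


def k_parallel(lines, absorbers, r):
    """Same-line estimate of K(r).

    lines: list of (a, b) endpoints; absorbers: list of position lists,
    one per line. Pairs separated by exactly the full length of the only
    line long enough to hold them are a degenerate case not handled here.
    """
    total_length = sum(b - a for a, b in lines)
    n = sum(len(positions) for positions in absorbers)
    estimate = 0.0
    for (a, b), positions in zip(lines, absorbers):
        for i, x in enumerate(positions):
            for j, y in enumerate(positions):
                u = abs(x - y)
                if i == j or not (0 < u <= r):
                    continue
                c = c_parallel(x, u, a, b)
                estimate += (4 * math.pi * u ** 2 * total_length ** 2
                             / (c * n * (n - 1) * length_with_partner(lines, u)))
    return estimate


# One line of length 10 with absorbers at 2 and 5, evaluated at r = 4.
print(k_parallel([(0.0, 10.0)], [[2.0, 5.0]], 4.0))
```

In this toy case the two ordered pairs have $C_\parallel = 1$ and $C_\parallel = 2$, because a sphere of radius 3
centered at 2 leaves the line on one side, so the absorber near the edge receives the larger weight.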

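We now show that the estimator of $\lambda^2 Q K(r)$ using pairs along the same line of sight has an
unbiasedness property similar to the one found in (A1)-(A5) as $d \to 0$:

\begin{align}
& E\Big[\sum_{x \neq y}^{\parallel} \frac{1_{(0,r]}(|x - y|)\, 4 |x - y|^2}{d^2 C_\parallel} \cdot \frac{Q}{\pi d^2 v_1(A^{\parallel}_{|x-y|})}\Big] \tag{A8} \\
&\quad = \lambda^2 \int_{\mathbb{R}^3} \int_0^\infty \int_{\partial B_0(0,1)} 1_{(0,r]}(u)\, 1_{A^{\parallel}_u}(x)\, 1_{L(x)}(x + (u,\Omega))\, \frac{4 u^2}{d^2 C_\parallel}\, \frac{Q}{\pi d^2 v_1(A^{\parallel}_u)}\, \frac{u^2\, d\Omega}{4\pi u^2}\, dK(u)\, dx \tag{A9} \\
&\quad = \lambda^2 Q \int_{\mathbb{R}^3} \int_0^\infty 1_{(0,r]}(u)\, 1_{A^{\parallel}_u}(x)\, \frac{\pi d^2 C_\parallel + o(d^2)}{\pi d^2 C_\parallel \cdot \pi d^2 v_1(A^{\parallel}_u)}\, dK(u)\, dx \tag{A10} \\
&\quad = \lambda^2 Q \int_0^r \frac{1}{\pi d^2 C_\parallel} \big(\pi d^2 C_\parallel + o(d^2)\big)\, dK(u) \tag{A11} \\
&\quad = \lambda^2 Q K(r) + o(d^2) = \lambda^2 v_1(L)\, \pi d^2 K(r) + o(d^2). \tag{A12}
\end{align}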

Note that when an absorber is observed at $x$ on some line, its center need not be on the line.
In fact, the center lies exactly on the line with probability zero. All we can infer is that the center
is nearby, at most distance $d$ away. Accordingly, we take $1_{A^{\parallel}_u}(x)$ to mean that an absorber center is
located so that the center of its interval of intersection with a line of sight is in $A^{\parallel}_u$, and thus the
integral over $x$ in (A10) yields $\pi d^2 v_1(A^{\parallel}_u)$ rather than $v_1(A^{\parallel}_u)$. Since we do not observe exactly where the centers
of the absorbers are, all of our estimates of $K(r)$ have some small inherent uncertainty that does
not disappear as the size of the observation region increases. Specifically, as the observation region
grows, our estimates of $K(r)$ converge to some average of $K(u)$ for $u \in [r - d, r + d]$. This does not
pose a problem whenever $r$ is much greater than $d$.

We next derive a similar expression for $K_\perp(r)$, the estimator for $K(r)$ using only
across-line-of-sight information (see §2.2 for a discussion of its validity). For any absorber $x$, we consider
absorbers $y$ that lie within a distance $r$ on a different line of sight. Both the assigned weight and
Ohser's extension have to be changed. Define the set $S_x = S_x(|x - y|) = \partial B_0(x, |x - y|) \cap (L \setminus L(x))$,
the set of intersections between $\partial B_0(x, |x - y|)$ and $L \setminus L(x)$. For the absorber pair $(x, y)$, the
assigned weight is

\begin{align}
w(x, |x - y|) &= 4 d^{-2} |x - y|^2\, du \,/\, v_1\big(\partial B_{du}(x, |x - y|) \cap (L \setminus L(x))\big) \notag \\
&= 4 d^{-2} |x - y|^2 / C_\perp , \notag
\end{align}

where $C_\perp = C_\perp(x, |x - y|) = \sum_{s \in S_x} (\cos\theta_s)^{-1}$ and $\theta_s$ is the angle between the line of sight on
which $s$ lies and the line joining $s$ and $x$. Figure 7 shows why the factor of $(\cos\theta_s)^{-1}$ is used.
Referring to Figure 7, note that although absorber $y$ is observed on line of sight $k$, the intersection
between $\partial B_{du}(x, u)$ and line of sight $i$ is included in the computation of the weight, since the
weight is inversely proportional to the volume of the region in which the appearance of an absorber
center would yield an observed location of the absorber in the shell of radius $u$ and thickness $du$
centered at $x$. For an estimator that uses only across-line-of-sight information, the intersection
between $\partial B_{du}(x, u)$ and $L(x)$, the line containing $x$, is not taken into consideration. The estimate
of $\lambda^2 Q K(r)$ is

$$
\sum_{x \neq y}^{\perp} \frac{1_{(0,r]}(|x - y|)\, 4 |x - y|^2}{d^2 C_\perp} \cdot \frac{Q}{\pi d^2 v_1(A^{\perp}_{|x-y|})} ,
$$

where $A^{\perp}_{|x-y|} = \bigcup_{k=1}^{m} \{x \in L_k : \partial B_0(x, |x - y|) \cap (L \setminus L_k) \neq \emptyset\}$. Ohser's extension is $\pi d^2 v_1(A^{\perp}_{|x-y|})/Q$
and is the proportion of points in $L$ that are a distance $|x - y|$ from at least one other point on
another line. This yields as an estimate of $K(r)$,

$$
K_\perp(r) = \sum_{x \neq y}^{\perp} \frac{1_{(0,r]}(|x - y|)\, 4\pi |x - y|^2\, v_1(L)^2}{C_\perp\, N(N - 1)\, v_1(A^{\perp}_{|x-y|})} .
$$

With appropriate changes, i.e. $C_\parallel$ and $A^{\parallel}_u$ replaced by $C_\perp$ and $A^{\perp}_u$, steps (A8)-(A12) hold for
$K_\perp(r) - K_\perp(r_0)$, where $r_0$ is the shortest distance between different lines of sight in the catalog.
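
The geometric factor $C_\perp$ can be computed by intersecting the sphere of radius $u$ about $x$ with
every other line of sight and summing $1/\cos\theta_s$ over the intersection points. The sketch below is
one possible way to do this, not the implementation used here: it assumes each line is given as a
segment $p + t e$ with unit direction $e$ and $t \in [t_{\min}, t_{\max}]$, a representation and function name
chosen only for illustration.

```python
import math


def c_perp(x, u, other_lines):
    """Sum of 1/cos(theta_s) over the points s where the sphere of radius u
    centered at x meets the other lines of sight.

    other_lines: list of (p, e, t_min, t_max), each describing the segment
    p + t*e with unit direction e and t in [t_min, t_max].
    """
    total = 0.0
    for p, e, t_min, t_max in other_lines:
        q = [pi - xi for pi, xi in zip(p, x)]      # vector from x to p
        qe = sum(a * b for a, b in zip(q, e))      # (p - x) . e
        qq = sum(a * a for a in q)                 # |p - x|^2
        disc = qe * qe - (qq - u * u)              # from |q + t e|^2 = u^2
        if disc <= 0:
            continue  # the sphere misses the line or only touches it
        root = math.sqrt(disc)
        for t in (-qe - root, -qe + root):
            if t_min <= t <= t_max:
                # e . (s - x) = qe + t, so cos(theta_s) = |qe + t| / u
                total += u / abs(qe + t)
    return total


# A line parallel to the z axis passing 3 units from x, sphere radius 5:
# two intersections at height +/-4, each with cos(theta_s) = 4/5.
line = ((3.0, 0.0, 0.0), (0.0, 0.0, 1.0), -10.0, 10.0)
print(c_perp((0.0, 0.0, 0.0), 5.0, [line]))
```

A line that crosses the shell obliquely contributes more than one that crosses it radially, since
the length of line inside a shell of thickness $du$ is $du/\cos\theta_s$.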

The estimate of $K(r)$ which uses all absorber pairs is now not difficult to obtain. This estimate
is not simply a sum of $K_\parallel(r)$ and $K_\perp(r)$. It is true that it involves a sum of all absorber pairs,
both along and across lines of sight. However, in each term of the sum, the expression for Ohser's
extension and the assigned weight are different. For absorber pair $(x, y)$, the probed volume that
contributes to the weight is now a sum of that probed by $L(x)$ and $L \setminus L(x)$:
$w(x, |x - y|) = 4 d^{-2} |x - y|^2 / (C_\parallel + C_\perp)$. The set of points with at least one other point $|x - y|$ away is now the union
of the two sets, $A_{|x-y|} = A^{\parallel}_{|x-y|} \cup A^{\perp}_{|x-y|}$. With these adjustments, we have

$$
K(r) = \sum_{x \neq y} \frac{1_{(0,r]}(|x - y|)\, 4\pi |x - y|^2\, v_1(L)^2}{(C_\parallel + C_\perp)\, N(N - 1)\, v_1(A_{|x-y|})} .
$$
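
To give a feel for the size of the combined weight $4 d^{-2} |x - y|^2 / (C_\parallel + C_\perp)$, the following sketch
evaluates it for one pair. The numerical values are purely hypothetical (the method never requires
$d$ to be specified), and the function name is an illustrative assumption.

```python
def pair_weight(u, d, c_par, c_perp):
    """Weight 4 u^2 / (d^2 (C_par + C_perp)) of a pair at separation u,
    for absorbers of radius d."""
    return 4.0 * u ** 2 / (d ** 2 * (c_par + c_perp))


# Hypothetical numbers: separation 100 and absorber radius 0.1 in the same
# units, two same-line intersections and C_perp = 2.5 from other lines.
print(pair_weight(100.0, 0.1, 2, 2.5))
```

Because $d$ is much smaller than the separations of interest, such weights are very large; the
factors of $d$ cancel in the final estimator, which depends only on $v_1(L)$, $N$, and the geometry.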

Even when using $K(r)$, the fraction of volume of $\partial B_{du}(x, u)$ probed by the absorber catalog is
very small and thus the weights are always much larger than 1. Nevertheless, even in the moderate
size catalog used here, there is sufficient information to enable us to obtain useful estimates of the
three-dimensional reduced second moment measure at large distances.

Fig. 1. Example of an unobserved point (open dot) within distance $r$ of another point $x$. Also
shown is an observed point $y$ within distance $r$ of $x$. This point is given a weight $w(x, |x - y|)$ equal
to the reciprocal of the fraction $\alpha/2\pi$ of the circle that is contained within the sample region $A$.

Fig. 2. Estimates of the reduced second moment measure, $K_\parallel(r)$ (dashed line) and $K(r)$ (solid
line), divided by their Poisson expectation $\frac{4}{3}\pi r^3$, together with the latter's approximate 95%
confidence region (dotted line; see text), for the 276 QSO lines of sight, containing a total of 345 C IV
absorbers obtained from the Vanden Berk et al. catalog.

Fig. 3. The 95% regions of variation of $K_\parallel(r)$ (dashed line) and $K(r)$ (solid line), divided by
their Poisson expectation value, $\frac{4}{3}\pi r^3$, for 10,000 simulated data sets of unclustered absorbers with
the same total number of lines and average number of absorbers as the Vanden Berk et al. catalog.
The averages for both estimators (dotted lines) are very near their expectation value of unity.

Fig. 4. Average volume-weighted correlation functions (solid lines) $\bar{\xi}(r - 25, r + 25)$ (top panel),
$\bar{\xi}_\perp(r - 25, r + 25)$ (middle panel), and $\bar{\xi}_\parallel(r - 25, r + 25)$ (bottom panel), for $100 < r < 300$ $h^{-1}$ Mpc,
for the 276 QSO lines of sight and 345 C IV absorbers of the Vanden Berk et al. catalog. Their
approximate 95% regions of variation are also shown (dashed lines; see text).







Fig. 5. Ratio of the standard errors of $K(r)$ for $m = 1000$ (short-dashed line), 10,000 (long-dashed
line), and 100,000 (solid line), to that of $K(r)$ for $m = 100$, for mock unclustered catalogs. Note
the continued reduction in the standard error on larger scales.

Fig. 6. Ratio of the standard errors of $K(r)$ to $K_\parallel(r)$ for $m = 100$ (dotted line), 1000
(short-dashed line), 10,000 (long-dashed line), and 100,000 (solid line), with mock unclustered catalogs.
Also shown is the same ratio, for $m = 100$ lines of sight, but where the angular density of lines is
10 times higher (short-dashed and dotted line) and 100 times higher (long-dashed and dotted line).

Fig. 7. Schematic showing how to obtain the weights for $K_\perp(r)$ and $K(r)$ in a two-dimensional
setting. Absorbers are observed at $x$ and $y$, and are separated by distance $u = |x - y| < r$; $d$ is
the assumed radius of all absorbers. For $K_\perp(r)$, the weight $w(x, y)$ is inversely proportional to the
sum of the volumes of the two singly hatched regions. Note that the thickness of the shaded region
through line of sight $i$ is $dr/\cos\theta_s$. For $K(r)$, the weight $w(x, y)$ is inversely proportional to the
sum of the volumes of the three hatched regions.

Table 1. Number of absorber pairs as a function of pair separation $r$